# databricks-formula1

## Formula 1 race data

An end-to-end data pipeline built with Azure Databricks that organizes the data into bronze, silver, and gold zones.
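
As a rough illustration of the bronze, silver, and gold layering, here is a minimal sketch in plain Python rather than Spark, so it runs anywhere. The CSV content, the column names, and the validation rule (drop rows with missing points) are hypothetical and are not taken from this project.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw file content standing in for data landed in the bronze zone.
RAW_RESULTS_CSV = """raceId,driverId,points
1,hamilton,25
1,verstappen,18
2,verstappen,25
2,hamilton,
"""

def bronze(raw_text):
    """Bronze: keep the rows exactly as ingested (every value is a string)."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def silver(rows):
    """Silver: cast types and drop rows that fail a basic validation rule."""
    clean = []
    for row in rows:
        if not row["points"]:  # assumed rule: rows without points are rejected
            continue
        clean.append({
            "race_id": int(row["raceId"]),
            "driver_id": row["driverId"],
            "points": float(row["points"]),
        })
    return clean

def gold(rows):
    """Gold: a business-level aggregate, here total points per driver."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["driver_id"]] += row["points"]
    # Highest total first, like a championship standings table.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

standings = gold(silver(bronze(RAW_RESULTS_CSV)))
print(standings)  # {'verstappen': 43.0, 'hamilton': 25.0}
```

Each zone only reads from the one before it, so a bad rule in the silver step can be fixed and re-run without re-ingesting the raw data.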

The goal of my project was to simulate data flow in Databricks and gain hands-on experience with data engineering and processing techniques. To do this, I built a sample data pipeline with several stages of data processing, starting with ingesting data in different file formats, such as CSV, JSON, and Parquet.
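
Ingesting several formats usually means normalizing them into one tabular shape. The sketch below, a stdlib-only assumption rather than the project's code, reads a CSV source and a JSON Lines source (one object per line, which is also what Spark's JSON reader expects by default) and flattens a nested field. The file contents and field names are hypothetical. Parquet is a binary columnar format that needs a third-party library such as pyarrow outside Spark, so it is left out here.

```python
import csv
import io
import json

# Hypothetical source files, inlined as strings so the example is self-contained.
CIRCUITS_CSV = """circuitId,name,country
1,Albert Park,Australia
2,Sepang,Malaysia
"""

DRIVERS_JSONL = """{"driverId": 1, "name": {"forename": "Lewis", "surname": "Hamilton"}}
{"driverId": 2, "name": {"forename": "Nick", "surname": "Heidfeld"}}
"""

def read_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def read_json_lines(text):
    """Parse one JSON object per line, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def flatten_driver(record):
    """Flatten the nested name object into a single string column."""
    name = record["name"]
    return {
        "driver_id": record["driverId"],
        "name": f'{name["forename"]} {name["surname"]}',
    }

circuits = read_csv(CIRCUITS_CSV)
drivers = [flatten_driver(r) for r in read_json_lines(DRIVERS_JSONL)]
print(circuits[0]["name"], drivers[0]["name"])  # Albert Park Lewis Hamilton
```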

To ensure data quality and accuracy, I cleansed and validated the data, then transformed it with filters, aggregations, and joins. I also improved query performance by partitioning the data on selected columns.
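
The validate, filter, join, aggregate, and partition steps can be sketched in plain Python as follows. In PySpark these would typically map to `DataFrame.filter`, `DataFrame.join`, `DataFrame.groupBy`, and `DataFrameWriter.partitionBy`; the last one writes one `column=value` directory per distinct value, which lets queries that filter on that column skip the other directories. The rows and the "points must be non-negative" rule are hypothetical.

```python
from collections import defaultdict

# Hypothetical cleaned inputs (silver zone).
races = [
    {"race_id": 1, "race_year": 2020, "name": "Austrian Grand Prix"},
    {"race_id": 2, "race_year": 2021, "name": "Bahrain Grand Prix"},
]
results = [
    {"race_id": 1, "driver": "Bottas", "points": 25.0},
    {"race_id": 2, "driver": "Hamilton", "points": 25.0},
    {"race_id": 2, "driver": "Verstappen", "points": 18.0},
    {"race_id": 3, "driver": "Unknown", "points": 10.0},  # no matching race
    {"race_id": 2, "driver": "Perez", "points": -1.0},    # fails validation
]

def is_valid(row):
    """Assumed validation rule for this sketch: points must be non-negative."""
    return row["points"] >= 0

def inner_join(left, right, key):
    """Inner join on `key`; left rows with no match in `right` are dropped."""
    index = {r[key]: r for r in right}  # assumes `key` is unique in `right`
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

def points_by_year_and_driver(rows):
    """Aggregate: total points per (race_year, driver)."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["race_year"], r["driver"])] += r["points"]
    return dict(totals)

def partition_paths(rows, column):
    """Group rows the way a Hive-style partitioned write lays them out."""
    parts = defaultdict(list)
    for r in rows:
        parts[f"{column}={r[column]}"].append(r)
    return dict(parts)

joined = inner_join([r for r in results if is_valid(r)], races, "race_id")
totals = points_by_year_and_driver(joined)
partitions = partition_paths(joined, "race_year")
print(sorted(partitions))  # ['race_year=2020', 'race_year=2021']
```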

Throughout the project, I used Delta Lake to handle incremental loads and keep the data reliable. I also used temporary views and global temporary views to access and manipulate the data with SQL, which made it easier to write complex queries and transformations.
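
An incremental load into a Delta table is commonly done with `MERGE INTO`: update rows whose key already exists and insert the rest. The sketch below imitates only that upsert behavior in plain Python. It then uses the stdlib `sqlite3` module's `CREATE TEMP VIEW` as a stand-in for Spark's `createOrReplaceTempView`. Both are connection or session scoped, but SQLite has no equivalent of `createOrReplaceGlobalTempView`. The key columns and rows are hypothetical.

```python
import sqlite3

def merge_upsert(target, updates, keys):
    """Mimic MERGE semantics: WHEN MATCHED THEN UPDATE, WHEN NOT MATCHED THEN INSERT."""
    merged = {tuple(row[k] for k in keys): row for row in target}
    for row in updates:
        merged[tuple(row[k] for k in keys)] = row  # replaces if key exists, else adds
    return list(merged.values())

# Hypothetical full load, then an incremental batch that corrects one row
# and adds a row for a new race.
initial = [
    {"race_id": 1, "driver_id": "hamilton", "points": 25.0},
    {"race_id": 1, "driver_id": "verstappen", "points": 18.0},
]
incremental = [
    {"race_id": 1, "driver_id": "verstappen", "points": 19.0},  # correction
    {"race_id": 2, "driver_id": "verstappen", "points": 25.0},  # new row
]
results = merge_upsert(initial, incremental, keys=("race_id", "driver_id"))

# Load the merged rows, expose them through a temporary view, query with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (race_id INTEGER, driver_id TEXT, points REAL)")
conn.executemany(
    "INSERT INTO results VALUES (:race_id, :driver_id, :points)", results
)
conn.execute(
    "CREATE TEMP VIEW v_driver_points AS "
    "SELECT driver_id, SUM(points) AS total FROM results GROUP BY driver_id"
)
standings = conn.execute(
    "SELECT driver_id, total FROM v_driver_points ORDER BY total DESC"
).fetchall()
print(standings)  # [('verstappen', 44.0), ('hamilton', 25.0)]
```

Re-running the same incremental batch leaves `results` unchanged, which is the property that makes a merge-based load safe to retry.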

Overall, this project gave me practical experience with data engineering and processing techniques, as well as with platform features such as Delta Lake and temporary views.
